1.  INTRODUCTION


Smirnov [1] defined a statistic $V_n$ as the number of times a continuous distribution $F(x)$ crosses the vertical steps of its empirical distribution $F_n(x)$, based on $n$ independent observations from a distribution $F(x)$.

Smirnov showed that

$$P(V_n \le t\sqrt{n}) \to 1 - e^{-t^2/2}$$

for $t \ge 0$ and $P(V_n \le t\sqrt{n}) \to 0$ for $t < 0$ as $n \to \infty$, independent of $F(x)$. By generalizing this result to counting the number of crossings of $F(x) + \lambda/\sqrt{n}$ with $F_n(x)$, and of $F(x) \pm \lambda/\sqrt{n}$ with $F_n(x)$, Smirnov was able to show that

$$P\Bigl(\sup_x \,(F_n(x) - F(x)) \le \lambda/\sqrt{n}\Bigr) \to 1 - e^{-2\lambda^2}$$

and

$$P\Bigl(\sup_x \,|F_n(x) - F(x)| < \lambda/\sqrt{n}\Bigr) \to \sum_{k=-\infty}^{\infty} (-1)^k e^{-2k^2\lambda^2}$$

for $\lambda \ge 0$, as $n \to \infty$. The purpose of this paper is to derive the exact distribution of $V_n$ and define a goodness-of-fit test based on $V_n$. Actually we consider $C_n$, which is the number of crossings of $F(x)$ with the horizontal steps of $F_n(x)$. Note that $C_n = V_n - 1$. The reason for choosing $C_n$ is that there is a simple relationship between $C_n$ and the transformed observations $F(X_i)$, which allows the test to be applied without recourse to actually plotting $F(x)$ and $F_n(x)$ and counting the number of crossings. The procedure is described in section 4.

2.  THE DISTRIBUTION OF $C_n$


We disregard the steps $F_n(x) = 0$ and $F_n(x) = 1$ and consider the $n - 1$ steps corresponding to $F_n(x) = i/n$ for $i = 1, 2, \ldots, n-1$, and seek $P(C_n = k)$ for $k = 0, 1, \ldots, n-1$.


If there are exactly $i$ observations less than $F^{-1}(i/n)$, then the $i$th step will cross $F(x)$. Conversely, if the $i$th step crosses, there must have been exactly $i$ observations less than $F^{-1}(i/n)$. Since $P(X \le F^{-1}(i/n)) = P(F(X) \le i/n) = i/n$, we can use the binomial distribution to get

$$P(E_i) = \binom{n}{i} \left(\frac{i}{n}\right)^{i} \left(1 - \frac{i}{n}\right)^{n-i}, \qquad i = 1, 2, \ldots, n-1, \tag{2.1}$$

where $E_i$ is defined as the event that the $i$th step crosses $F(x)$.
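As a small numerical illustration of (2.1), the sketch below evaluates $P(E_i)$ exactly with rational arithmetic. The function name `crossing_prob` is introduced here for illustration only and is not notation from the paper.

```python
from fractions import Fraction
from math import comb


def crossing_prob(n: int, i: int) -> Fraction:
    """P(E_i) from (2.1): exactly i of n uniform observations fall below i/n."""
    p = Fraction(i, n)
    return comb(n, i) * p**i * (1 - p) ** (n - i)


# For n = 3 there are two interior steps, i = 1 and i = 2.
step_probs = [crossing_prob(3, i) for i in (1, 2)]
```

For $n = 3$ both steps cross with probability $4/9$, which reflects the symmetry of (2.1) under $i \to n - i$.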
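For $1 \le i < j \le n-1$, we compute the simultaneous occurrence of $E_i$ and $E_j$ by conditioning on $E_j$ as

$$P(E_i E_j) = P(E_i \mid E_j)\, P(E_j).$$

The $i$th step will cross, given that $E_j$ has occurred, if and only if $i$ of the $j$ observations in $(0, j/n)$ are in $(0, i/n)$, i.e.,

$$P(E_i E_j) = \binom{j}{i} \left(\frac{i}{j}\right)^{i} \left(1 - \frac{i}{j}\right)^{j-i} \binom{n}{j} \left(\frac{j}{n}\right)^{j} \left(1 - \frac{j}{n}\right)^{n-j}. \tag{2.2}$$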


Using the notation $P(E_i) = {}_nP_i$ and $P(E_i E_j) = {}_nP_{ij}$, we have $P(E_i E_j) = ({}_jP_i)({}_nP_j)$. Extending this notation we get, for $1 \le i < j < k \le n-1$,

$${}_nP_{ijk} = P(E_i E_j E_k) = P(E_i E_j \mid E_k)\, P(E_k) = ({}_kP_{ij})({}_nP_k) = ({}_jP_i)({}_kP_j)({}_nP_k). \tag{2.3}$$


This scheme can be extended to the simultaneous occurrence of any $k$ events $E_j$, $k = 1, 2, \ldots, n-1$. We define

$$S_j = \sum {}_nP_{i_1 i_2 \cdots i_j}, \tag{2.4}$$

where the summation is taken over all subscripts $1 \le i_1 < i_2 < \cdots < i_j < n$, and the indicator variables

$$X_i = \begin{cases} 1 & \text{if the $i$th step crosses,} \\ 0 & \text{otherwise,} \end{cases}$$

for $i = 1, 2, \ldots, n-1$. Since $s^{X_i} = 1 + X_i(s-1)$, we have

$$\prod_{i=1}^{n-1} s^{X_i} = 1 + \sum X_{i_1}(s-1) + \sum X_{i_1} X_{i_2}(s-1)^2 + \cdots + X_1 X_2 \cdots X_{n-1}(s-1)^{n-1},$$

where the summations are taken over the ranges as in (2.4). Since $C_n = \sum_{i=1}^{n-1} X_i$, we have the probability generating function of $C_n$ as

$$P(s) = E\left(\prod_{i=1}^{n-1} s^{X_i}\right) = 1 + \sum_{j=1}^{n-1} (s-1)^j S_j, \tag{2.5}$$

because $E(X_{i_1} X_{i_2} \cdots X_{i_j}) = {}_nP_{i_1 i_2 \cdots i_j}$.

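Note that we can write

$${}_nP_i = \frac{n!}{n^n} \, \frac{i^{i} (n-i)^{n-i}}{i!\,(n-i)!}$$

and, in general,

$${}_nP_{i_1 i_2 \cdots i_j} = \frac{n!}{n^n} \, \frac{i_1^{\,i_1}}{i_1!} \, \frac{(i_2 - i_1)^{i_2 - i_1}}{(i_2 - i_1)!} \cdots \frac{(n - i_j)^{n - i_j}}{(n - i_j)!},$$

$$S_j = \sum {}_nP_{i_1 i_2 \cdots i_j} = \frac{n!}{n^n} \sum \prod_{i=1}^{j+1} \frac{\lambda_i^{\lambda_i}}{\lambda_i!}, \tag{2.6}$$

where this summation is taken over $\sum_{i=1}^{j+1} \lambda_i = n$ and $\lambda_i \ge 1$.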


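We calculate the unconstrained sum

$$\sum x^{\lambda_1 + \cdots + \lambda_{j+1}} \prod_{i=1}^{j+1} \frac{\lambda_i^{\lambda_i}}{\lambda_i!} = \left( \sum_{\lambda=1}^{\infty} \frac{\lambda^{\lambda} x^{\lambda}}{\lambda!} \right)^{j+1} \tag{2.7}$$

and use the coefficient of $x^n$ to get $S_j$.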


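Using the Residue Theorem, we have

$$\frac{\lambda^{\lambda}}{\lambda!} = \frac{1}{2\pi i} \oint \frac{e^{\lambda z}}{z^{\lambda+1}} \, dz \tag{2.8}$$

and

$$\sum_{\lambda=1}^{\infty} \frac{\lambda^{\lambda} x^{\lambda}}{\lambda!} = \frac{1}{2\pi i} \sum_{\lambda=1}^{\infty} \oint \left( \frac{x e^{z}}{z} \right)^{\lambda} \frac{dz}{z} = \frac{1}{2\pi i} \oint \frac{x e^{z}}{z\,(z - x e^{z})} \, dz = \frac{z(x)}{1 - z(x)}, \tag{2.9}$$

where $z(x)$ is the solution to $z = x e^{z}$ such that $z(x)/x \to 1$ as $x \to 0$.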


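The coefficient of $x^n$ is obtained from

$$\frac{1}{2\pi i} \oint \frac{1}{x^{n+1}} \left( \frac{z(x)}{1 - z(x)} \right)^{j+1} dx; \tag{2.10}$$

and changing the variable of integration from $x$ to $z$ (with $x = z e^{-z}$, so that $dx = (1 - z) e^{-z}\, dz$), we have

$$S_j = \frac{n!}{n^n} \, \frac{1}{2\pi i} \oint \frac{e^{nz}}{z^{n}} \left( \frac{z}{1 - z} \right)^{j} dz = \frac{n!}{n^n\,(j-1)!} \sum_{i=0}^{n-j-1} \frac{n^{i}\,(n - 2 - i)!}{i!\,(n - j - 1 - i)!}. \tag{2.11}$$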
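The closed form (2.11) can be checked against the definition (2.4) for small $n$. The following is a minimal sketch, assuming the product form of (2.3) extended to $j$ indices; the function names `step_prob`, `s_bruteforce` and `s_closed` are hypothetical and chosen here only for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, factorial


def step_prob(m: int, i: int) -> Fraction:
    """mP_i as in (2.1): exactly i of m uniform observations fall below i/m."""
    p = Fraction(i, m)
    return comb(m, i) * p**i * (1 - p) ** (m - i)


def s_bruteforce(n: int, j: int) -> Fraction:
    """S_j from (2.4), summing products (i2 P i1)(i3 P i2)...(n P ij) over all index sets."""
    total = Fraction(0)
    for idx in combinations(range(1, n), j):
        chain = idx + (n,)
        term = Fraction(1)
        for a, b in zip(chain, chain[1:]):
            term *= step_prob(b, a)
        total += term
    return total


def s_closed(n: int, j: int) -> Fraction:
    """S_j from the closed form (2.11), for j >= 1; zero when j >= n."""
    if j >= n:
        return Fraction(0)
    acc = sum(
        Fraction(n**i * factorial(n - 2 - i), factorial(i) * factorial(n - j - 1 - i))
        for i in range(n - j)
    )
    return Fraction(factorial(n), n**n * factorial(j - 1)) * acc
```

The brute-force sum grows combinatorially with $n$, so it is useful only as a check on small cases; for $n = 3$ both routes give $S_1 = 8/9$ and $S_2 = 2/9$.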


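Notice that $S_j = 0$ for $j \ge n$ in (2.4) and (2.11). Thus from (2.5) we have

$$P(s) = 1 + \sum_{j=1}^{\infty} (s-1)^j S_j = 1 + \frac{n!}{n^n} \, \frac{1}{2\pi i} \oint \frac{e^{nz}}{z^{n}} \sum_{j=1}^{\infty} \left( \frac{(s-1)z}{1 - z} \right)^{j} dz = 1 + \frac{n!}{n^n} \, \frac{1}{2\pi i} \oint \frac{e^{nz}}{z^{n-1}} \, \frac{s - 1}{1 - sz} \, dz. \tag{2.12}$$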


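We have

$$C(s) = \frac{P(s)}{1 - s} = \sum_{k=0}^{\infty} P(C_n \le k)\, s^{k} = \frac{1}{1 - s} - \frac{n!}{n^n} \, \frac{1}{2\pi i} \oint \frac{e^{nz}}{z^{n-1}} \, \frac{1}{1 - sz} \, dz,$$

which is the generating function of the cumulative probabilities.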
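Since $C^{(k)}(0) = P(C_n \le k)\, k!$, we have, for $k = 0, 1, \ldots, n-2$,

$$P(C_n \le k) = 1 - \frac{n!}{n^n} \, \frac{1}{2\pi i} \oint \frac{e^{nz}}{z^{n-1-k}} \, dz = 1 - \frac{n!}{(n - 2 - k)!\; n^{k+2}} \tag{2.13}$$

and, for $k = 0, 1, \ldots, n-1$,

$$P(C_n = k) = \frac{(k+1)\, n!}{n^{k+2}\,(n - 1 - k)!}. \tag{2.14}$$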
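The exact distribution in (2.13) and (2.14) is simple to tabulate. The sketch below is one way to do so with rational arithmetic; the names `cdf` and `pmf` are assumptions introduced here, and the boundary case $P(C_n \le n-1) = 1$ is handled explicitly because (2.13) is stated for $k \le n-2$.

```python
from fractions import Fraction
from math import factorial


def cdf(n: int, k: int) -> Fraction:
    """P(C_n <= k) from (2.13), extended by 0 below the range and 1 at k >= n-1."""
    if k < 0:
        return Fraction(0)
    if k >= n - 1:
        return Fraction(1)
    return 1 - Fraction(factorial(n), factorial(n - 2 - k) * n ** (k + 2))


def pmf(n: int, k: int) -> Fraction:
    """P(C_n = k) from (2.14) for k = 0, ..., n-1; zero elsewhere."""
    if not 0 <= k <= n - 1:
        return Fraction(0)
    return Fraction((k + 1) * factorial(n), n ** (k + 2) * factorial(n - 1 - k))


# Exact distribution for n = 3.
dist_n3 = [pmf(3, k) for k in range(3)]
```

For $n = 3$ the probabilities are $1/3$, $4/9$ and $2/9$, and successive differences of `cdf` reproduce `pmf`.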


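Using Stirling's formula and $\ln(1 - x) \approx -x - x^2/2$ for small $x$, we have, for $k = t\sqrt{n}$,

$$P(C_n \le k) \approx 1 - \left( \frac{n}{n - (k+2)} \right)^{n - (k+2) + 1/2} \exp\bigl(-(k+2)\bigr)$$

$$= 1 - \exp\left[ -\tfrac{1}{2} \ln\left(1 - \frac{k+2}{n}\right) - \bigl(n - (k+2)\bigr) \ln\left(1 - \frac{k+2}{n}\right) - (k+2) \right]$$

$$\approx 1 - \exp\left[ -\frac{(k+1)(k+2)}{2n} \right] \approx 1 - \exp\left(-\frac{t^2}{2}\right),$$

which is Smirnov's result.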


3.  RELATED  RESULTS 

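The characteristic function of $C_n$ may be obtained from (2.12) by substituting $s = e^{it}$ and noting that $S_j = 0$ for $j \ge n$. We get

$$\phi_n(t) = 1 + \sum_{j=1}^{n-1} (e^{it} - 1)^{j} S_j \tag{3.1}$$

and calculate the moments as

$$E(C_n^{\,r}) = \sum_{j=1}^{n-1} \sum_{k=0}^{j} \binom{j}{k} (-1)^{j-k} k^{r} S_j, \tag{3.2}$$

where we can use the known result

$$\sum_{k=0}^{j} \binom{j}{k} (-1)^{j-k} k^{r} = \begin{cases} 0 & \text{if } r < j, \\ j! & \text{if } r = j. \end{cases} \tag{3.3}$$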


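In general, writing ${}_nS_j$ for $S_j$ to show its dependence on the sample size $n$, we get the recursion relation

$${}_nS_j = \sum_{u=j}^{n-1} ({}_nP_u)({}_uS_{j-1}), \qquad j = 1, 2, \ldots, n-1, \tag{3.4}$$

where ${}_uS_0 = 1$ for all $u$.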

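The characteristic function of $C_n$ may be calculated from

$$E\left(e^{itC_n}\right) = \sum_{m=0}^{n-1} \sum_{k=0}^{n-1-m} (-1)^{k} \binom{m+k}{k} S_{m+k}\, e^{itm}, \tag{3.5}$$

and by a change of variable $u = m + k$, and a change in the order of summation we get

$$\phi_n(t) = \sum_{u=0}^{n-1} S_u \sum_{m=0}^{u} (-1)^{u-m} \binom{u}{m} e^{itm}. \tag{3.6}$$

Using (3.4), we get the form

$$\phi_n(t) = 1 + (e^{it} - 1) \sum_{u=1}^{n-1} {}_nP_u\, \phi_u(t). \tag{3.7}$$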


From (3.6), the mean and variance of $C_n$ are

$$E(C_n) = S_1, \qquad \operatorname{Var}(C_n) = S_1(1 - S_1) + 2S_2, \tag{3.8}$$

and moments may be calculated using (3.5) to get

$$E(C_n^{\,r}) = \sum_{u=0}^{n-1} \left( \sum_{m=0}^{u} (-1)^{u-m} \binom{u}{m} m^{r} \right) S_u.$$

A table of the exact distribution of $C_n$ up to $n = 30$ is given in table 1. In some goodness-of-fit tests, convergence to the asymptotic distribution is rapid and the asymptotic distribution may be used for small sample sizes. In the case of the Cramér-von Mises statistic, Marshall [2] has shown that the asymptotic distribution may be used for sample sizes of three or four. In the case of $C_n$, convergence is slow and it is not until $n > 100$ that the exact and asymptotic distributions become reasonably close. In fact, calculating

$$P(C_n + 1 \le \sqrt{n}\, t) = P(C_n \le \sqrt{n}\, t - 1) = P(C_n \le k)$$

for $k = 0, 1, 2, \ldots, n-1$, and comparing these values with $1 - e^{-t^2/2}$, where $t = (k+1)/\sqrt{n}$ for $k = 0, 1, \ldots, n-1$, we find the maximum difference between the exact and asymptotic cumulative distributions decreases as follows:


    n                       2       3       4       5      10      20      30
    Maximum difference   0.3679  0.2912  0.2315  0.2146  0.1469  0.1013  0.0822

    n                      40      50      60     100
    Maximum difference   0.0708  0.0632  0.0575  0.0443
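The comparison behind these figures can be sketched in a few lines. The code below evaluates the exact cumulative probabilities from (2.13) in floating point and takes the largest absolute gap from $1 - e^{-t^2/2}$ at $t = (k+1)/\sqrt{n}$; the function names `exact_cdf` and `max_difference` are assumptions introduced for this illustration.

```python
from math import exp, factorial


def exact_cdf(n: int, k: int) -> float:
    """P(C_n <= k) from (2.13), with P(C_n <= n-1) = 1."""
    if k >= n - 1:
        return 1.0
    return 1.0 - factorial(n) / (factorial(n - 2 - k) * n ** (k + 2))


def max_difference(n: int) -> float:
    """Largest |exact - asymptotic| over k = 0, ..., n-1, with t = (k+1)/sqrt(n)."""
    return max(
        abs(exact_cdf(n, k) - (1.0 - exp(-((k + 1) ** 2) / (2.0 * n))))
        for k in range(n)
    )


# Sample sizes listed in the table above.
differences = {n: max_difference(n) for n in (2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 100)}
```

The entries of `differences` can then be compared with the tabulated values; for $n = 2$ the maximum occurs at $k = 1$ and equals $e^{-1} \approx 0.3679$.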



4.  A GOODNESS-OF-FIT  TEST 


We use $C_n$ as our test statistic and note that if the hypothesized distribution $F(x)$ is correct we expect $C_n$ to be large. Thus, acceptance regions coincide with large values of $C_n$. We divide the interval $(0,1)$ into $n$ equal cells of length $1/n$. The $i$th step of the empirical distribution function crosses $F(x)$ if and only if there are exactly $i$ observations less than $F^{-1}(i/n)$. This occurs if and only if there are exactly $i$ transformed observations $F(X)$ in the first $i$ cells of length $1/n$. Thus, if we denote the cell frequencies as $f_1, f_2, \ldots, f_n$, $C_n$ will be equal to the number of times

$$f_1 + f_2 + \cdots + f_i = i, \qquad i = 1, 2, \ldots, n-1. \tag{4.1}$$

To apply the test we need simply transform the sample $X_1, X_2, \ldots, X_n$ to $F(X_1), F(X_2), \ldots, F(X_n)$ and count according to (4.1). Critical points and significance levels may be calculated from (2.13).
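The procedure can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names `crossing_count` and `lower_tail`, the made-up sample, and the choice of the unit exponential as the hypothesized $F$ are all assumptions introduced here. Since small values of $C_n$ count against the hypothesis, the significance level attached to an observed count $c$ is taken as $P(C_n \le c)$ from (2.13).

```python
from math import exp, factorial


def crossing_count(u):
    """C_n via (4.1): the number of i in 1..n-1 with exactly i values of u below i/n."""
    n = len(u)
    return sum(1 for i in range(1, n) if sum(v < i / n for v in u) == i)


def lower_tail(n: int, c: int) -> float:
    """P(C_n <= c) from (2.13), with P(C_n <= n-1) = 1."""
    if c >= n - 1:
        return 1.0
    return 1.0 - factorial(n) / (factorial(n - 2 - c) * n ** (c + 2))


# Hypothetical data, tested against the unit exponential F(x) = 1 - exp(-x).
sample = [0.05, 0.31, 0.52, 0.94, 1.70]
u = [1.0 - exp(-x) for x in sample]  # transformed observations F(X_i)
c = crossing_count(u)
p = lower_tail(len(u), c)
```

Counting the values below $i/n$ directly is equivalent to accumulating the cell frequencies $f_1 + \cdots + f_i$, and avoids building the cells explicitly.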

